Learning the Sequence Determinants of Alternative Splicing from Millions of Random Sequences
نویسندگان
چکیده
Most human transcripts are alternatively spliced, and many disease-causing mutations affect RNA splicing. Toward better modeling the sequence determinants of alternative splicing, we measured the splicing patterns of over two million (M) synthetic mini-genes, which include degenerate subsequences totaling over 100 M bases of variation. The massive size of these training data allowed us to improve upon current models of splicing, as well as to gain new mechanistic insights. Our results show that the vast majority of hexamer sequence motifs measurably influence splice site selection when positioned within alternative exons, with multiple motifs acting additively rather than cooperatively. Intriguingly, motifs that enhance (suppress) exon inclusion in alternative 5' splicing also enhance (suppress) exon inclusion in alternative 3' or cassette exon splicing, suggesting a universal mechanism for alternative exon recognition. Finally, our empirically trained models are highly predictive of the effects of naturally occurring variants on alternative splicing in vivo.
منابع مشابه
Role of Aberrant Alternative Splicing in Cancer
Alternative splicing can alter genome sequence and as a consequence, many genes change to oncogenes. This event can also affect protein function and diversity. The growing number of study elucidate the pathological influence of impaired alternative splicing events on numerous disease including cancer. Here, we would like to highlight the significant role of alternative splicing in cancer biolog...
متن کاملAn Evolutionary and Phylogenetic Study of the BMP15 Gene
DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...
متن کاملComparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine
One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many studies in the past focus on positional distribution of r...
متن کاملRelation Between RNA Sequences, Structures, and Shapes via Variation Networks
Background: RNA plays key role in many aspects of biological processes and its tertiary structure is critical for its biological function. RNA secondary structure represents various significant portions of RNA tertiary structure. Since the biological function of RNA is concluded indirectly from its primary structure, it would be important to analyze the relations between the RNA sequences and t...
متن کاملVirtual genetic coding and time series analysis for alternative splicing prediction in C. elegans
MOTIVATION Prediction of alternative splicing has been traditionally based on the study of expressed sequences, helped by homology considerations and the analysis of local discriminative features. More recently, machine learning algorithms have been developed that try avoid or reduce the use of a priori information, with partial success. OBJECTIVE AND METHOD With the aim of developing a fully...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Cell
دوره 163 شماره
صفحات -
تاریخ انتشار 2015